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Abstract — We present a converged algorithm for Tikhonov 
regularized nonnegative matrix factorization (NMF). We specially 
choose this regularization because it is known that Tikhonov 
regularized least square (LS) is the more preferable form in 
solving linear inverse problems than the conventional LS. Because 
an NMF problem can be decomposed into LS subproblems, 
it can be expected that Tikhonov regularized NMF will be 
the more appropriate approach in solving NMF problems. The 
algorithm is derived using additive update rules which have been 
shown to have convergence guarantee. We equip the algorithm 
with a mechanism to automatically determine the regularization 
parameters based on the L-curve, a well-known concept in the 
inverse problems community, but is rather unknown in the NMF 
research. The introduction of this algorithm thus solves two 
inherent problems in Tikhonov regularized NMF algorithm re- 
search, i.e., convergence guarantee and regularization parameters 
determination. 

Index Terms — converged algorithm, inverse problems, L-curve, 
nonnegative matrix factorization, Tikhonov regularization. 

I. Introduction 

THE nonnegative matrix factorization (NMF) is a tech- 
nique that decomposes a nonnegative matrix into a pair 
of other nonnegative matrices. Given a nonnegative matrix A, 
the NMF seeks to find two nonnegative matrices B and C 
such that: 

AwBC, (1) 

where A e R A + IxN = [ai,...,a w ], B e R A + IxR = 
[bi,...,b R ], C G R RxN = [ci,...,cjv], R denotes the 
number of factors which usually is chosen so that R <C 
min(M, N), and R^ /xAr denotes M by N matrix with non- 
negative entries. The conventional method in computing B 
and C is by minimizing the distance between A and BC in 
Frobenius norm, 

minJ(B,C) = -||A-BC||| s.t. B > 0, C > 0. (2) 

In addition, other criteria like Kullback-Leibler divergence (T), 
|f2) and Csiszars (^-divergence Q can also be used. 

Previously, the NMF was studied under the term positive 
matrix factorization by Paatero et al. 0, El - The popularity of 
the NMF is due to the work of Lee and Seung f6] in which they 
introduced a simple yet powerful NMF algorithm, and then 
show its applicability in image processing and text analysis. 
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In addition, the algorithm also produces sparser factors (thus 
requires less storage) J6]-||8] and can give more intuitive 
results compared to other subspace approximation techniques 
like Principal Component Analysis (PCA) and Independent 
Component Analysis (ICA) (6), J7), |9). Due to these reasons, 
many works explored the possibility of applying the NMF in 
some problem domains, e.g., document clustering iflOl . ifTTI . 
spectral analysis IPT21 . lfT3l . image processing Q, E, blind 
source separation |fl"4l . and cancer detection [TT31 — fTT/71 . and 
showed that the NMF can give better results. 

Recently, various works have been conducted to extend the 
standard NMF formulation (eq. O to also include auxiliary 
constraints such as sparseness Q, J8], lfl6l . IfTTI . smoothness 
iTTOl. ITT21. Ifl3l. and orthogonality (19)-|24|. These constraints 
are usually formulated based on inherent properties of the 
data and prior knowledge about the applications, so that com- 
puted solutions can be directed to have desired characteristics. 
Algorithms for solving the problems are mainly based on 
multiplicative update rules algorithms. This is due to the 
convenience of deriving algorithm directly from corresponding 
objective function. However, as multiplicative update rules 
based NMF algorithms do not have convergence guarantee 
J3, ifTHl . [fT9l . l25l . the development of converged algorithms 
for various NMF objectives with auxiliary constraints is an 
open research problem. Note that even though some alternating 
nonnegativity-constrained least square based NMF algorithms 
(e.g., projected gradient methods fl25l , 11271 , projected quasi- 
Newton method |28l , active set method 11291 , and block princi- 
pal pivoting method fl30l ) do have convergence guarantee, due 
to the complexity of the algorithms, it's not always clear how 
to incorporate those auxiliary constraints into the algorithms. 

In this work, we propose a converged algorithm for 
Tikhonov regularized NMF using additive update rules. The 
additive update rules based algorithm for standard NMF first 
appeared in the work of Lee & Seung JTJ, but the convergence 
proof was given by Lin in ref. ITT81 . As in the multiplicative 
update rules version, the additive update rules based algorithm 
can also be derived directly from corresponding NMF objec- 
tive, thus providing a convenient way in deriving converged 
algorithms for various NMF objectives. 

We choose Tikhonov regularization as the auxiliary con- 
straint because this constraint has been used in many applica- 
tions, e.g., text mining 1101 . spectral data analysis 0~2), |[l"3ll , 
microarray data analysis (29], and cancer class discovery [fl6l 
(in some works, sparseness is enforced using norm on 
the solution, i.e., the Tikhonov regularization, instead of L\ 
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norm — the more appropriate constraint for enforcing sparse - 
ness 1 3 j 8 ). and showed that it can offer better results compared 
to the results of standard NMF. This constraint also can reduce 
influence of noise and other uncertainties in the data fl9l , 
02), 021 . In addition, from the inverse problem study, it is 
known that Tikhonov regularized least square (LS) is the more 
preferable form in solving inverse problems because solutions 
of the conventional LS tend to be unstable and dominated by 
data and rounding errors Il32l - ll34l . Moreover, in the presence 
of noise, frequently the conventional LS solutions are rather 
undesirable as it leads to amplification of noise in the direction 
of singular vectors with small singular values [35 1. Since LS 
is the building block of the NMF, 

N 

||A-BC||| = ^|[a„-Bc n |||, (3) 

n=l 

then it can be expected that Tikhonov regularized NMF will 
be the more appropriate approach to solving NMF problem in 

Introducing Tikhonov regularization into the NMF brings 
the issue of how to properly determine the regularization 
parameters. From the inverse problems study it is known that 
the effectiveness of regularization methods depends strongly 
on the parameters; too much regularization creates a loss of 
information, and too little regularization leads to a solution that 
is dominated by noise components and has similar problems 
as in the unregularized solutions. There are two methods 
that are usually be used in determining an appropriate value 
for the regularization parameter: the L-curve and Morozov 
discrepancy principles. In this paper we will utilize the L-curve 
since the Morozov discrepancy principles require knowledge 
of the error level in the data which is often inaccessible 11361 . 

The L-curve is a graphical tool that displays the trade- 
off between approximation error and solution size as the 
regularization parameter varies. In this curve, the proper value 
for the regularization parameter is the value associated with 
corner of the curve where both solution and approximation 
error have minimum norms 11321 . Il34l . There are some methods 
proposed to find this L-corner, e.g., J32], fl34], (37], ED. We 
will use a method proposed by Oraintara et al. in ref. ll34l 
in which they defined L-corner to be the point of tangency 
between L-curve with positive curvature and a straight line of 
negative slope. Fig. Q] shows such condition for L-corner. We 
choose this method because it has convergence guarantee and 
is relatively faster to compute than the standard method; the 
maximum curvature approach 1132) . 

II. Tikhonov Regularized NMF 

Tikhonov regularization is a method for regularizing solu- 
tions of linear inverse problems in order to enhance stability of 
the solutions and reduce the observational errors. The method 
was developed independently by Phillips [3_9'J and Tikhonov 
KOI . In statistic it is also known as ridge regression. Because 
Tikhonov regularization can smooth the solutions, it is often 
used as a smoothness constraint. 

Given a linear inverse problem, 

y = Ax (4) 




Approximation error 



Fig. 1. The generic L-curve with positive curvature; a is the L-comer. 

where x G R M denotes unknown vector to be estimated, 
y G M. N denotes observation data, and A G R NxM denotes 
distortion matrix. The classical approach is to use standard LS 
approach to estimate x, 

x = argmin ||y - Ax|||. (5) 

X 

To improve the solution, usually Tikhonov regularized LS is 
used instead 1321 : 

x A = argmin||y- Ax||| + A||x||| (6) 

X 

where A denotes nonnegative regularization parameter, ||y — 
Ax||f, denotes approximation error, and ||x||f, denotes solu- 
tion size. 

In l34l . Oraintara et al. proposed an iterative algorithm to 
compute A based on the L-curve criterion depicted in fig. [T] 
In summary, x and A can be computed using algorithm [Tj 



Algorithm 1 Iterative algorithm for computing x and A. 

Initialize x (0) and A (0) . 

Set k <- 

repeat 

k <- k + 1 

x (*0 <-argmin||y- Ax||£. + A^" 1 ) ||x||| (7) 

|| x (fe) _ X (*-D|| 



until error < e 



where 7 is the slope of the straight line that is tangent to the L- 
curve and e is a small nonnegative number that is set to 0.001 
in the authors' work ll34l . Note that the value of 7 doesn't 
influence convergence property of sequence x^ fc ' and \( k \ and 
as long as A^ ^ is sufficiently small then converges to a 
stationary point 1341 . The similar method for computing A can 
also be found in ref. 11331 , but the authors fixed 7 value to one. 

We will now derive formulation for Tikhonov regularized 
NMF. The NMF problem in eq. [2] is known to be nonconvex 
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and may have several local mimima |25l . The common 
practice to deal with the nonconvexity of an optimization 
problem is by transforming it into convex subproblems iBTI . 
In the case of the NMF, this can be done by employing the 
alternating strategy; fixing one matrix while solving for the 
other 1251 (apparently, all NMF algorithms utilizing alternating 
strategy). This strategy transforms an NMF problem into a pair 
of convex subproblems. The following equations give convex 
subproblems of the NMF, 



B =argmin-||A-BC||p 
b B>o2" UF 

C = argmin-||A-BC||i 
6 c>o2" " F 



(9) 



(10) 



Alternatingly solving eq.|9]for B and eq.[l0]for C is known as 
alternating nonnegativity-constrained LS (ANLS), and usually 
each of these subproblems is solved by decomposing it into 
a series of corresponding nonnegativity-constraint LS (NNLS) 
problems. The following equations are the NNLS versions of 
eq. |9] and eq. [10] 

1 



TiT i|2 



b m = arg min -||o m - C b... 

m >0 Z 

c„ = arg min -||a„ - Bc n |||, Vn, 

c Tl >0 Z 



Vm 



(ID 



(12) 



where b rn and a m denotes the m-th row of B and A 
respectively. 

As shown, each of these NNLS problems is exactly the 
standard LS problem with additional nonnegativity constraint. 
Accordingly, Tikhonov regularization can be employed to 
improve the solutions. The following equations give Tikhonov 
regularized version of the above NNLS problems. 



f>m = ai "g min ^|| 

b m >0 Z 



arg mm - a* 

c„ >G Z 



- C 1 b 



T U T ||2 
m\\F 



/? ro ||b£|||Vm, (13) 



Bc„ 



1 



a„||c n ||^, Vn, (14) 



where a n and (3 m denotes the corresponding nonnegative 
regularization parameters. By rearranging rows of B and 
columns of C back, Tikhonov regularized NMF can be written 
as: 

1 



B 



argrmn|||A-BC||| + i||v^B|||, (15) 



1 



C = arg min -II A - BC 
6 c>o 2" 



1 



2 



(16) 



where /3 = diag(/?i, . . . , /3m) and a = diag(ai, . . . , ajv). 

The following gives a generic algorithm for Tikhonov 
regularized NMF where update rules for a n and f3 m are 
derived based on the work of Oraintara et al. 11341 , 7^ and 7^ 
are defined similarly as in algorithm [TJ and e denotes small 
positive number. 

III. A CONVERGED ALGORITHM FOR TIKHONOV 
REGULARIZED NMF 

We will now present a converged algorithm for Tikhonov 
regularized NMF based on additive update rules. By combin- 
ing update rules for B and C in eq. Q3] and eq. [16] we define 



Algorithm 2 A generic algorithm for Tikhonov regularized 
NMF. 

Initialize B^ ', C< ', ct(°), and /3 (0) . 

Set k <- 

repeat 

B^^argminillA-BCWlll + ill^Blll 
C (*H-i) ^ arg min i||A - B( fc+1 )C|||, + I||CV^ (fc) |||- 



'oo 2' 



r'm ^ I /: 



B\ ll"rn 



T -c( fc ) T b r ( „ fc+1)T|12 



7 ' 2 

F 



,(fc+l)T,|2 



Vm 



Ha _ T{( k + 1 ) r^ k+1 ' ) \\ 2 
a (k+l) LClFn & Cn ILf y 



,(fe+l)||2 



k <- k + 1 
until 



max (V B J(B) B) < e & max (V c J(C) C) < e 



objective function for Tikhonov regularized NMF as follows: 



min J(B,C) = i||A - BC|||, + ±\\^B\\ 2 F + i||CVS|| 



B,C 



s.t. B > 0,C> 0. 



(17) 



The Karush-Kuhn-Tucker (KKT) function of the objective can 
be written as: 

L(B, C) = J(B, C) - tr (r B B T ) - tr (r c C) . 



where T B e M x and T c e M x denotes the KKT 
multipliers. Partial derivatives of L with respect to B and C 
are: 

V B i(B) = V B J(B)-r B , 

v c £(c) = VcJ(c)-r£, 

with 

V B J(B) = BCC T - AC T + /3B, 
V c J(C) = B T BC - B T A + Ca. 

By results from optimization studies, (B*, C*) is a stationary 
point of eq. [T7] if it satisfies the KKT optimality conditions 

ma, i.e., 

B* > 0, C* > 0, (18) 

v B j(B*) = r B >o, v c J(c*) = rg > o, (19) 

V B J(B*)0B* = 0, V C J(C*)0C* = O ) (20) 

where denotes Hadamard products (component-wise mul- 
tiplications), and eq. |20] is known as the complementary 
slackness. 
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Multiplicative update rules based algorithm for Tikhonov 
regularized NMF can be derived by utilizing the complemen- 
tary slackness: 

(BCC T AC T + f3B) B = 0, 
(B T BC - B T A + Cot) C = 0. 



These equations lead to the following update rules: 

(AC T ) 



(BCC T + /3B) ? 

(b t a) 

V / rn 

(B T BC + Cot) 



Vm, r 



Vr, n 



(21) 



(22) 



where b mr and c rn denote (m, r) entry of B and (r, n) entry 
of C respectively. 

As stated by Lin ifTHl , the above multiplicative update rules 
based algorithm can be modified into an equivalent converged 
algorithm by (1) using additive update rules, and (2) replacing 
zero entries that do not satisfy the KKT conditions with a small 
positive number to escape the zero locking. 

The additive update rules version of the algorithm can be 
written as: 

-VB*/(B) mr , 



BCC J 



(3B) 



c rr 



B T BC + Ca 



-VcJ(C), 



By inspection it is clear that this algorithm also suffers from 
the zero locking, i.e.: 

V B J(B) mr < & b mr = 0, or 
V c J(C) rn < & c rn = 0, 

such that when these conditions happened, the algorithm can 
no longer update the corresponding entries even though those 
entries haven't satisfied the KKT optimality conditions in 
eq.GH 

Algorithm [3] gives necessary modifications to avoid zero 
locking and — as will be shown later — also has convergence 
guarantee, where denotes component-wise division, 



if V B J(B( fc \C( fc )) >0 
max(&«,a) if V B J(B^\ < 



(23) 



(k) 

Cm 



if V c J(B( fe+1 ),CW) >0 

V / rn — 

maxCc&V) if V c J(B( fc+1 ),C( fe )) rn < 



(24) 

denote the modifications to avoid the zero locking with a is 
a small positive number, 5b and <5c denote small positive 
numbers that introduced to avoid division by zeros, B and 
C denote matrices that contain b mr and c rn respectively, and 

V B J(B«, C< fc >) = BWC(*>C« T - AC (fe)T 
+ /3 (fe) B (fc) ; 

V c J(B (fe+1) , C (fc) ) = B (fc+1)T B (fc+1) C ( ' £) - B (k+1)T A 



Algorithm 3 A converged algorithm for Tikhonov regularized 
NMF. 

Initialization B<°) > 0, C(°> > 0, > OVm, and a [ n ] > 

OVn. 

k <- 

repeat 



B (fe+i) ^_ B (fc) _ g(fc) q v B J(B (fc) , C (fe) )0 
^ B ( fc ) c (fc) c (fe)T + /3 (fc) B (fe) +( 5 B ) 

c (fc+i) ^_ C (fc) _ c (k) v c J(B (fe+1) ,C (fc) )G 
( B (fc+i)T B (fe+i) C (fe) + c(k) a (k) + s ^ 



(25) 



(26) 



(fe+i) 



\a T m -C^ T b£ +1)T \\l 

|| ( ,(fc+l)T|| 2 , r 



, Vm (27) 



i(t+VuC| ^rt iVn (28) 



17* 



,(fc+l)||2 



<5c 



fc <- k + 1 
until 



max (V B J(B) B) < e & max (V c J(C) C) < e 



where < step < 1. Note that since algorithm [3] is free from 
the zero locking, B and C can be initialized using nonnegative 
matrices. Theorem Q] explains this formally. 

Theorem 1: If B° > and C° > 0, then B k > and 
C fc > 0, Vfc > 0. And if B° > and C° > 0, then B fc > 
and C k > 0, Vfc > 0. 

Proof: This statement is clear for k = 0, so we need only 
to prove for k > 0. 

Case 1: VsJmr > => b mr = b mr (see b mr definition in 
eq.ED. 

(fc+1) ( B WcWcW T +^BW) mr ^ + feC 
(B( fe )C( fc )C( fc )T +/3 W B (fe)) + <5 B 



(B( fc )C«C( fc ) T - AC^ T + f3^B^) b [k l 

V ' / mr 



(B( fc )C( fc )C( fc ) T + /3 (fc) B( fc )) 
(fe+AC^) mr 6( 



(B( fc )C( fe )C( fc ) T + /3 (fe) B( fe )) mr + <5 B ' 
Thus Vfc > 0, btl > b {k t l) > Vm, r, and b£l > 



L(fe + 1) 



> Vm, r. 



Case 2: V B Jmr < b mr ^ b mr . 

max^.^VBJCBW.CW) 



h (fc+i) = h (fc) 



r (B( fe )C( fc )C( fe ) T + /3 (fc) B( fc )) + «S B ' 

V ' / mr 

Because max (&^,cr) > and V B J(B<» , C^) r < 0, 
then Vfc > 0, b&l > b {k + 1] > Vm, r, and 6$. > => 
btt 1] > Vm, r. 
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Case 3: VcJm > => c r „ = c r „. 



To define G, let's rearrange B into: 



(B(Hi)T B (Hi) C W + C( fc )a( fc )) + (5 C 

V / rn 

( B (fe+l)T B (fe+l) C (fe) _ B (fe+1)T A + C (k) a (k)\ C W 
V / rn 



T _ 



( B (fc+l)T B (fc+l) C (fc) + C( fc )a( 



fc)l 



(fc) 



( B (fc+i)T B (fc+i) C (fc) + cWaW) + ,5c ' 

Thus Vfc > 0, c k n > => c^ +1) > Vr, n, and c k n > => 
c[n +1) > Vr,n. 

Case 4: VcJm < =>• c r „ ^ c r „. 

max^^VcJ^+^CWl 



^rn ^rn 



rB( fe + l ) T B( fc + l )c( fc ) + c( fc ) a (fc)) + <j c ■ 

V / rn 



Because mzx(d$,a) > and V c J(B( fc+1 ) , C<») < 0, 



then Vfc > 0, 4n > c^r ±; > Vr, n, and 4£ > ^> 

Crn +1 ^ > Vr, 71. 

By combining results for k — and fc > in case 1-4, the 
proof is completed. ■ 

Appendix [A] gives Matlab/Octave codes for implementing 
algorithm [3] 



„(*+!) 



„(*) 



IV. Convergence Analysis 

There are two type of update rules in algorithm [3] The first 
is the update rules for B^) and C( fe ), and the second is the 
update rules for /3^ fe ' and cx^ k \ Since the algorithm [3] uses al- 
ternating strategy in updating these variables, the convergence 
analysis can be carried out separately. This approach is known 
as the block-coordinate descent method [42|. 

To derive the convergence guarantee of solution sequence 
{BW, C( k \p( k ', a( fc H, we will first show the convergence 
of sequence {B^, C( fc )}, and then sequence {/3",a^}. 
From convergence analysis study, the following conditions 
must be satisfied for sequence {BW,CW} to have conver- 
gence guarantee HU, E?), ll44l . 

1) The nonincreasing property of sequence jCB^ k \ C^), 
i.e., 

a) j(B( fc+1 )) < j(B( fe )) and 

b) j(c( fc+1 )) < j(c( fe )). 

2) Any limit point of sequence {Bw, C( fe H generated by 
algorithm [3] is a stationary point. 

3) Sequence {B( fe ),C( fe H has at least one limit point. 

A. The nonincreasing property of sequence j(B") 

We will utilize the auxiliary function approach introduced in 
JTj to prove this property. By using the auxiliary function as an 
intermediate function, the nonincreasing property of j(B^ fc ^) 
can be restated with: 

j( B (fc+i)) = G(B( fc+1 \B( fe+1 >) < G(B^,BW) 
<G(BW,B' fc )) = J(BW). 



where b m denotes the m-th row of B. And also let's define: 



V« T j(» fcr ) 



Vb3(BW)[ 



v b 3(bw; 



M. 



where V B 3(B( fe )) m denotes the m-th row of \7 B J(B^) 
B (k) c (k) G (k)T _ lc (fe)T +/3 (fc) B( fc ). Then define, 



D = diag (D 1 , . . . ,D 



where D m denotes a diagonal matrix with its diagonal entries 
defined as: 



^g(fc) C (fc) C (fc)T +( g(fc)g(fc) > ] 



if r G I„ 
if r £ Xn 



with 



l m ee {r\b\ k l > 0, V B J(B (fc) ) mr ^ 0, or 
^ = 0, V B J(BW) mr <0} 

denotes the set of non-KKT indices in m-th row of B< fc ), and 
* is defined so that * = and = 0. 
The auxiliary function © can be defined as: 

e$(S r ,#» r )E3(#) r )+tr {(58 - Q3W)V^3(S#) T )} 

+ itr {(95 - S8(*))D(!8 - & k) ) T }. 

(29) 

Note that 3 and 25 are equivalent to J and G with B is rear- 
ranged into 23 T , and other variables are reordered accordingly. 
And also whenever X^ fc+1 ) is a variable, we remove (fc + 1) 
sign. And: 

V< B T0(<B T , $ 8 (fc)T ) = D(Q3 - <B (fc) ) T + V,b*-3(® ( * )t ). 

By definition D is positive definite for all B' fe ' not satisfy the 
KKT conditions and positive semidefinite if and only if B' fc ) 
satisfies the KKT conditions. Thus ©(Q3 T , ( 8 (fc)T ) is a strict 
convex function, and consequently has a unique minimum, so 
that: 

D(Q3 - <B (fe) ) T + V !8 !r3(B (A ' )T ) = 0, (30) 
rgT = sg(fc)T _ D- 1 V !8 r3(» (fc)T ), 

which is exactly the update rule for B in eq. |251 

Lemma 1: -3(*B T ) can be rewritten as: 

a(» T ) =z{ f & {k)T ) +tr {(»-»«) v S Ta(a3 (fc)T )} 

+ itr {(33 - ®W)V B J(B«) (03 - <8«) T }. 

(31) 
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where 



V|J B* 



'C«C« T + ftI 



C (fc) C (fe)T 



3/A/J 



and I denotes corresponding compatible identity matrix. 
Proof: Let decompose J(B) into: 



J(B) 



1 



|a^-C( fc ) T b^||| + -/3 m ||bf„||| Vm, 



so that J(B) = JIB) 



J B 



AT 



2' 
Then, 



<9J(B) 
<9b m 



and 



a = -a m C( fe)T + b m cWcW T + m b r 



dbi 




By using the Taylor series expansion, j(B) can be rewritten 



J(B) 



which is the m-th row of 3(53 T ). ■ 
To prove the nonincreasing property of J(B( fe '), the fol- 
lowing statements must be shown: 

1) ©(<B T ,Q3 T ) =3(<B T ), 

2) 0(5B feT ,^ T ) =a( ( B feT ), 

3) © <8 T ,<B T ) < ©(<B T ,<B fcT ), and 

4) 0(S T ,<B feT ) < ©(<8 feT ,<8 feT ). 

The first and second will be proven in theorem |2] the third in 
theorem [5] and the fourth in theorem @] 

Theorem 2: ©(<8 T ,<B T ) = 3(<B T ) and ©(Q3« T , <&^ T ) 

= a( $ 8 (fc)T )- 

Proof: These are obvious from the definition of © in 
eq.|29] ■ 

Theorem 3: ©(Q3 T ,<B T ) < ©(<8 T , <BW T ). Moreover if 
and only if B fe satisfies the KKT conditions in eq. 
then 0(<B T ,» T ) =0(® T ,»W T ). 
Proof: 

©(Q3 T , <8 (fc)T ) - ©(Q3 T , <B T ) = 
l -tx (d-V|J(bW)) (23 - <8«) T } 

a 2 j(B) \ , 



i M 

111 = 1 



bl fc) ) D m - 



If D r 



Vm are all positive definite, then the 
inequality always holds except when b TO = b^ Vm, where 
based on theorem Q] and update rule eq. [25] happened if and 
only if the point has reach a stationary point — a point where 
the KKT conditions are satisfied. Thus, it is sufficient to prove 
the positive definiteness of D™ — Vg J(B fc ) m Vm. 



Let = b rn — b^m 7^ 0, then we must prove: 



rp ( 8 2 J(B) \ 

iL D m Vm > o. 



Note that 



dbi 



(b«Xg)) r +5 E 



if r G I„ 
if r ^ I„ 



where denotes the m-th row of B( fc ); and X„ = 
C( fc )C^ fc ^ T + j3 m I and D m are both symmetric matrix. Thus, 



vi 



d 2 J(B) 



dbi 



E 



2 °B 

— 1 L'mr r— 1 *^mr r s— 1 



(x«S 



(fc)T\ 



- - E^ 



E^ 



s=l 



R R 

EE- 



r v s x rs 



r—l 5 — 1 
r — 1 s— 1 



Umr 

(fe) _ 



^EE^ 2 

s— 1 r—l 



TO, 



b (k) 

Urns 




(k) 

where v r denotes the 7--th entry of v m and x rs denotes the 



d 2 j(B) r 



Vm are all 



(r, s) entry of X' fe '. Therefore, D' 
positive definite ■ 
Theorems (<8 T , »( fc > T ) < © (<B« T , «8< fe > T ). More- 
over, if and only if B satisfies the KKT conditions, then 

0(BT i{8 (*)T) =©(Q5W T ,Q5( fe ) T ). 
froo/: 

© (<B« T , <8^ T ) - © (<8 T , »( fe ) T ) = 

-tr {(B-B^V*^®^)} 
--tr {(»-»W)D(»-*B( fc )) T }. 
By using eq. [30j and the fact that D is positive semi-definite: 

© (»<*) T ,B(*> T ) - © (*B T ,»( fc ) T ) = 

-tr {(05- 5B^)D(!8 - Q3 (fe) ) T } > 0, 

we proved that © (*8 T ,<BW T ) < © (<BW T ,!B( fe ) T ). 

Now, let's prove the second part of the theorem. By the 
update rule eq. [25] if B' fe ' satisfies the KKT conditions, then 
B will be equal to BW, and thus the equality holds. Now we 
need to prove that if the equality holds, then B^ satisfies the 
KKT conditions. To prove this, let consider a contradiction 
where the equality holds but BW does not satisfy the KKT 
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conditions. In this case, there exists at least an index (m, r) 
such that: 



# 6<W and <F r 



Note that by the definition in eq 



u mr 



> 



if 6 



£(*) 

0, then it 



satisfies the KKT conditions. Accordingly, b mr = b&% which K The convergence of sequence {£< J ,a W } 



of this sequence is clear by the objective eq. [TTJ if there exists 
I such that \imb l mr — > oo or limcj,„ — > oo, then lim j(B"), 
CW) oo > J(B(°), C(°)) which violates theorem 
With nonnegativity guarantee from theorem Q] it is proven 
that {BW,C( fc >} is in closed and bounded set. ■ 

( fc ) „(fc)i 



violates the condition for contradiction. So, b^ r cannot be 
equal to zero, and thus d™ r is well defined. Consequently, 



e(xW T M k)T )-efa T ,& k)T ) - 



bmr ~ bmr ) <5]3 



> 



which violates the condition for contradiction. Thus, it is 
proven that if the equality holds, then B' fe ' satisfies the KKT 
conditions. ■ 

The following theorem summarizes the nonincreasing prop- 
erty of sequence J (B^ fc ') 

Theorem 5: J (B( fc+1 )) < J (B< fe >) Vfc > under update 
rule eq. [25] with the equality happens if and only if B^ fe ' 
satisfies the KKT optimality conditions in eq. IT8ll20l 

Proof: This is the results of theorem [2] [3] and [4] ■ 

B. The nonincreasing property of sequence j(C^ fe ^) 

Theorem 6: J (C^ 1 )) < J (C^)) Vfc > under update 
rule eq. [26] with the equality happens if and only if C( fe ) 
satisfies the KKT optimality conditions in eq. |T8ll20l 

Proof: This theorem can be proven similarly as in 
j(B( fe) ) case. ■ 

C. The nonincreasing property of sequence j(B' fc \ C^ fc ^) 

Theorem 7: j(B^ k+1 \ C^) < j(B^ k+1 \ C«) < 
j(B < - k \ C^)) Vfc > under update rule eq. [25] and [26] with 
the equality happens if and only if (B' fe \C^ fe ') satisfies the 
KKT optimality conditions in eq. IT8ll20l 

Proof: This statement can be proven by combining the 
results in theorem [5] and [6] ■ 

By this theorem, we have proven the first conditions for 
the algorithm [3] to have convergence guarantee. The next 
subsection will give proofs for the second and the third 
condition. 

D. Limit points of sequence {B( fe ),C( fc )} 

Theorem 8: Any limit point of sequence |B' fc ), C^-*} gen- 
erated by algorithm [3] is a stationary point 

Proof: By theorem [7] algorithm [3] produces strictly de- 
creasing sequence jCB^ k ', C' fe )) until reaching a point that 
satisfies the KKT conditions. By update rules in eq. [25] and 
[26l after a point satisfies the KKT conditions, the algorithm 
will stop updating (B^ k \C^), i.e., B( fc+1 ) = B^ and 
Q(k+i) _ Q(k) -> where * is the iteration where the 
KKT conditions are satisfied. ■ 

Theorem 9: Sequence {B^), C^-* } has at least one limit 
point. 

Proof: As stated by Lin QjQ , it suffices to prove that 
is in a closed and bounded set. The boundedness 



The convergence guarantee of this sequence has been es- 
tablished by Oraintara et al. in ref. fl34) , |[43l . Here we will 
adopt their works and summarize the results in accord to our 
notations. 

Theorem 10 (Oraintara et al. j3?l/): The optimum j3 m cor- 
responding to the L-corner must satisfy 



1 2 

If 



C T bl 



1 2 

If' 



The similar condition can also derived for a n . 

(k) (k) 

Since the update rules for f3 m Vm and a n Vn are derived 
from this theorem, it implies that the update rules find the 
optimal solutions of these parameters for each iteration. 



Next we state the monotonicity property of sequence /3 



(k) 



and a 



(AO 



Lemma 2 (Oraintara et al. H34V ): The values of p m ' Vm 
either strictly increase or decrease under update rule eq. |27] 
unless converged to a limit point. The similar conditions also 
apply to oin^ Vn. 

And, the following theorem summarizes the convergence 
property of sequence and ■ 

Theorem 11 (Oraintara et al. I[34\l ): If the update rule 
eq. [27] converges, then it converges to the corresponding L- 
corner. The same condition also applies to the update rule 
eq.[28] 

Thus, lemma [2] and theorem fTTI state that while update rules 

eq. |27] and [28] generate monotonic updated values for {3$ Vm 

(AO 

and otn Vn which if the sequences converge, then they 
converge to the corresponding L-corner; there is no guarantee 
that the sequences will converge. Fortunately, the convergence 
can be established by choosing appropriate initial values. 

Directly from ref. 1134) . the followings summarize the strat- 
egy in choosing the initial values. 



(0) 



will 



1) If a„ is not in the range B, then choosing a 
produce an increasing sequence of converging to 
the nearest L-corner. 

2) If a„ is in the range of B, then it is always possible to 



choose > small enough so that J (an ) 



and convergence occurs. 
3) More generally, if \im ari . 



j(0) > 0, any initial value a„ produces a converging 



,J{a n )^/a n < 1 and 

(0) 



sequence a 
4) When linic^^oo J {a 
the update rule eq 



The 

B (k) 



/a n > 1, the only case when 
_ will not generate a converged 
sequence is when is chosen larger than the last 
intersection between J(a n ) an d the straight line a n = 
J{a n ) n . 

same conditions can also be derived for sequence 
Vm. Thus, by simply choosing a n = Vn and /3 m = 



Vm, the update rules eq. [27] and [28] will generate sequence 
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Pm Vra and a4 Vn that converge to the corresponding In- 
comers. 

F. The convergence of the solution sequence 

The following theorem gives the convergence guarantee of 
the solution sequence {'B < - k \C < - k \ f3^ k \ a^} generated by 
algorithm [3] 

Theorem 12: By choosing appropriate initial values 
(3$ Vm and Vn, algorithm [3] generates sequence 

{B( fc ),C( fc )} that converges to a point that satisfies the 
KKT optimality conditions (a stationary point), and sequence 
{/3^,a < - fe )} that converges to the corresponding L-corners. 

Proof: By the results of theorem|7]|8] and|9] we know that 
sequence {B( fc \C^ fc ^} converges to a point that satisfies the 
KKT conditions. By discussion in subsection IIV-EI we know 
that it is always possible to choose appropriate initial values for 
/3m ^ Vm and Vn so that sequence {(3^ k \ } converges 
to the corresponding L-corners. ■ 

V. Discussion on the convergence of algorithm[3] 

Algorithm [3] converges when it stops updating both se- 
quence {B (fc \C (fc) } and sequence {/3 (fe) , }. And as 
shown in eq. |27] and eq. |28l when {B<- k \cW} has con- 
verged then {(3^ k \ a( fe )} would also have converged. With 
the boundedness of B (fc) and C (fc) , /3 (fc) and a (fc) will also 
be bounded. Thus, algorithm [3] will be well-behaved through 
the update steps. This is an important fact since even though 
sequence J(B^ k ',C^ k n has the nonincreasing property, due 
to lemma |2] algorithm [3] may produce nondecreasing j(B^ k \ 
C( k \ f3^ k \ a( fc )) (we will refer this sequence as J(-))- This is 
because the nonincreasing property of J( ■ J should be shown 
by proofing that: 

1) j(B( fc+1 )) < J(B«), 

2) j(C( fc+1 )) < J(C«), 

3) j(/3< fc+1 )) < j(/3«), and 

4) J(a( fe+1 )) < j(a<*>). 

The first and second have been proven in theorem [5] and [6] 
respectively. But as stated in lemma El sequence /3 m Vm and 
a n Vn can either strictly increase or decrease until reaching 
the limit points, thus J(f3 (k+1) ) > J(/3 (fe) ) and/or J(c*( fc+1 )) 
> cases can occur. This, however, will not affect 

the convergence of algorithm [3] since as long as sequence 
{/3^ k ' ,a( fe )} converges, then the update rules eq. |251 and |261 
will eventually find the stationary point for {B'",C"}. 

Numerically, it seems that 7^ and 7„ play the key role in 
determining whether sequence {0} and {a [k) } will strictly 
increase or decrease with the big values lead to the increasing 
sequences and vice versa. 

VI. Conclusion 

We have presented a converged algorithm for NMF with 
Tikhonov regularization on its factor. There are two con- 
tributions that can be pointed out. The first is to show 
the connection between Tikhonov regularized linear inverse 
problems with constraint NMF which naturally leads to a 



mechanisme for determining the regularization parameter in 
the NMF automatically. And, the second is the development 
of a converged algorithm for NMF with Tikhonov regularized 
constraints. 

Appendix A 
Octave/Matlab codes for algorithm[3] 



function [B, C, newalpha, newbeta, iteration, 
olderror, errordiff, maxNablaB, 
maxNablaC, resNorm, solNorm] = 
TikhonovNMF3 (A, r, BO, CO, oldalpha, 
oldbeta, gammaB, gammaC, maxiter, tol) 

%The converged version of the algorithm 
%Use complementary slackness as stopping 
criterion 

format long; 

%%Check the input matrix 
if min (min (A) ) < 

error ('Input matrix cannot contain negative 
entries' ) ; 

return 
end 

[m,n] = size(A); 

%%Check input arguments 

if "exist (' A' ) 

error (' incorrect inputs!') 
end 

if "exist ( ' r ' ) 

error (' incorrect inputs!') 
end 

if "exist (' BO' ) 

BO = rand (m, r ) ; 
end 

if "exist (' CO' ) 

CO = rand (r, n) ; 
end 

if "exist (' alphaO' ) 

oldalpha = zeros (n,l); 
end 

if "exist ('betaO' ) 

oldbeta = zeros (m,l); 
end 

if "exist (' gammaB' ) 
gammaB = ones (m, 1 ) *0 . 1 ; %small values lead 
to better convergence property 

end 

if "exist (' gammaC ) 
gammaC = ones (n, 1 ) *0 . 1 ; %small values lead 
to better convergence property 

end 

if "exist (' maxiter ' ) 

maxiter = 1000; 
end 

if "exist (' tol' ) 

tol = 1.0e-9; 
end 

B = BO; clear BO; 
C = CO; clear CO; 
newalpha = oldalpha; 
newbeta = oldbeta; 
trAtA = trace (A' *A) ; 
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olderror = zeros (maxiter+1 , 1 ) ; 

olderror(l) = 0.5*trAtA - trace (C * (B' *A) ) + 
. 5*trace (C * (B' *B*C) ) + 
. 5*trace (B' *diag (newbeta) *B) + 
. 5*trace (C * (Odiag (newalpha) ) ) ; 

sigma = 1.0e-9; delta = sigma; 

for iteration=l :maxiter 
CCt = C*C ; 

gradB = B*CCt-A*C +diag (newbeta) *B; 
Bm = max(B, (gradB<0 ) *sigma) ; 
B = B - (Bm. *gradB. / (Bm*CCt + 
diag (newbeta) *Bm + delta)); 

BtB = B' *B; 

grade = BtB*C-B' *A+C*diag (newalpha) ; 
Cm = max (C, (gradC<0 ) *sigma) ; 
C = C - (Cm. *gradC . / (BtB*Cm + 
Cm*diag (newalpha) + delta)); 

for i = 1 : m 
newbeta (i) = gammaB ( i ) *norm ( (A (i , : ) - 

B(i, : ) *C) , ' fro' ) "2/ (norm(B (i, : ) , ' fro' ) "2 
+ delta) ; 

end 

for i = 1 : n 
newalpha (i) = gammaC ( i ) *norm ( (A ( : , i ) - 

B*C ( : , i) ) , ' fro' ) "2/ (norm(C ( : , i) , ' f ro' ) "2 
+ delta) ; 

end 

newerror = 0.5*trAtA - trace (C * (B' *A) ) + 
0.5*trace(C * (B' *B*C) ) + 
. 5*trace (B' *diag (newbeta) *B) + 
. 5*trace (C * (C*diag (newalpha) ) ) ; 

errordif f = abs (newerror - 

olderror (iteration) ) ; 
olderror ( iteration+1 ) = newerror; 
NablaB = (B*C*C - A*C + 

diag (newbeta) *B) . *B; 
NablaC = (B' *B*C - B' *A + 

C*diag (newalpha) ) . *C; 

maxNablaB = max (max (abs (NablaB) )) ; 
maxNablaC = max (max (abs (NablaC) )) ; 
if (maxNablaB < tol && maxNablaC < tol) 

break; 
end 

end 

resNorm = norm ( (A-B*C) , ' f ro' ) " 2 ; 
solNorm(l) = norm(B, ' fro' ) "2; 
solNorm(2) = norm(C, 'fro') "2; 



References 

[1] D. Lee and H. Seung, "Algorithms for non-negative matrix factorization," 
Proc. Advances in Neural Processing Information Systems, pp. 556-62, 
2000. 

[2] I.S. Dhillon and S. Sra, "Generalized nonnegative matrix approximation 
with Bregman divergences," UTCS Technical Reports, The University of 
Texas at Austin, 2005. 

[3] A. Cichocki, R. Zdunek, and S. Amari, "Csiszars divergences for non- 
negative matrix factorization: Family of new algorithms," Proc. 6th Int'I 



Conf. on Independent Component Analysis and Blind Signal Separation, 
2006. 

[4] P. Paatero and U. Tapper,"Positive matrix factorization: A non-negative 
factor model with optimal utilization of error estimates of data values," 
Environmetrics, Vol. 5, pp. 111-26, 1994. 

[5] P. Anttila, P. Paatero, and U. Tapper, "Source identification of bulk 
wet deposition in finland by positive matrix factorization," Atmospheric 
Environment, Vol. 29, No. 14, pp. 1705-18, 1995. 

[6] D. Lee and H. Seung, "Learning the parts of objects by non-negative 
matrix factorization," Nature, 401(6755), pp. 788-91, 1999. 

[7] P.O. Hoyer, "Non-negative matrix factorization with sparseness con- 
straints," The Journal of Machine Learning Research, Vol. 5, pp. 1457-69, 
2004. 

[8] S.Z. Li, X.W. Hou, H.J. Zhang, and Q.S. Cheng, "Learning spatially 
localized, parts-based representation," Proc. IEEE Comp. Soc. Conf. on 
Computer Vision and Pattern Recognition, pp. 207-12, 2001. 

[9] M. Berry, M. Brown, A. Langville, P. Pauca, and R.J. Plemmons, 
"Algorithms and applications for approximate nonnegative matrix fac- 
torization," Computational Statistics and Data Analysis, 2006. 

[10] F. Shahnaz, M.W. Berry, V. Pauca, and R.J. Plemmons, "Document clus- 
tering using nonnegative matrix factorization," Information Processing & 
Management, Vol. 42, No. 2, pp. 373-86, 2006. 

[11] W. Xu, X. Liu and Y. Gong, "Document clustering based on non- 
negative matrix factorization," Proc. ACM SIGIR, pp. 267-73, 2003. 

[12] VP Pauca, J. Piper, and R.J. Plemmons, "Nonnegative matrix factor- 
ization for spectral data analysis," Linear Algebra and Its Applications, 
Vol. 416, No. 1, pp. 29-47, 2006. 

[13] S. Jia and Y. Qian, "Constrained Nonnegative Matrix factorization for 
hyperspectral unmixing," IEEE Transactions on Geoscience and Remote 
Sensing, Vol. 47, No. 1, pp. 161-73, 2009. 

[14] A. Cichocki, S. Amari, R. Zdunek, R. Kompass, G. Hori, and Z. He, 
"Extended SMART algorithms for non-negative matrix factorization," 
Lecture Notes in Computer Science, Vol. 4029, pp. 548-62, 2006. 

[15] J. P. Branet, P. Tamayo, T.R. Golub, and J. P. Mesirov, "Metagenes 
and molecular pattern discovery using matrix factorization," Proc. Natl 
Acad. Sci. USA, Vol. 101, No. 12, pp. 4164-9, 2003. 

[16] Y. Gao and G. Church, "Improving Molecular cancer class discov- 
ery through sparse non-negative matrix factorization," Bioinformatics, 
Vol. 21, No. 21, pp. 3970-5, 2005. 

[17] H. Kim and H. Park, "Sparse non-negative matrix factorizations via 
alternating non-negativity constrained least squares for microarray data 
analysis," Bioinformatics, Vol. 23, No. 12, pp. 1495-502, 2007. 

[18] C.J. Lin, "On the convergence of multiplicative update algorithms for 
nonnegative matrix factorization," IEEE Transactions on Neural Net- 
works, Vol. 18, No. 6, pp. 1589-96, 2007. 

[19] A. Mirzal, Nonnegative Matrix Factorizations for Clustering and LSI, 
LAP Lambert Academic Publishing, 2011, chapter 4. 

[20] C. Ding, T. Li, W. Peng, and H. Park, "Orthogonal nonnegative matrix 
t-factorizations for clustering," Proc. 12th ACM SIGKDD Int'I Conf. on 
Knowledge Discovery and Data Mining, pp. 126-35, 2006. 

[21] J. Yoo and S. Choi, "Orthogonal nonnegative matrix factorization: Mul- 
tiplicative updates on Stiefel manifolds," Proc. 9th Int'I Conf. Intelligent 
Data Engineering and Automated Learning, pp. 140-7, 2008. 

[22] J. Yoo and S. Choi, "Orthogonal nonnegative matrix tri-factorization for 
co-clustering: Multiplicative updates on Stiefel manifolds," Information 
Processing & Management, Vol. 46, No. 5, pp. 559-70, 2010. 

[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," 
Proc. IEEE Int'I Joint Conf. on Neural Networks, pp. 1828-32, 2008. 

[24] H. Li, T. Adali, W. Wang, and D. Emge, "Non-Negative Matrix Factor- 
ization with Orthogonality Constraints for Chemical Agent Detection in 
Raman Spectra," Proc. IEEE Workshop on Machine Learning for Signal 
Processing, pp. 253-8, 2005. 

[25] C.J. Lin, "Projected gradient methods for non-negative matrix factoriza- 
tion," Technical Report ISSTECH-95-013, Department of CS, National 
Taiwan University, 2005. 

[26] E.F. Gonzales and Y. Zhang, "Accelerating the Lee-Seung algorithm 
for non-negative matrix factorization," Technical Report, Dept. Corn- 
put. Appl. Math., Rice Univ., Houston, 2005. 

[27] D. Kim, S. Sra, and I.S. Dhillon, "Fast projection-based methods for 
the least squares nonnegative matrix approximation problem," Stat. Anal. 
Data Min., Vol. 1, No. 1, pp. 38-51, 2008. 

[28] D. Kim, S. Sra, I.S. Dhillon, "Fast newton-type methods for the 
least squares nonnegative matrix approximation problem," Proc. SIAM 
Conference on Data Mining, pp. 343-54, 2007. 

[29] H. Kim and H. Park, "Nonnegative matrix factorization based on 
alternating nonnegativity constrained least squares and active set method," 
SIAM. J. Matrix Anal. & Appl, Vol. 30, No. 2, pp. 713-30, 2008. 



11 



[30] J. Kim and H. Park, "Toward faster nonnegative matrix factorization: A 
new algorithm and comparisons," Proc. 8th IEEE International Confer- 
ence on Data Mining, pp. 353-62, 2008. 

[31] R. Tibshirani, "Regression shrinkage and selection via the lasso," 
/. Royal Statistical Society B, Vol. 58, Issue 1, pp. 267-88, 1996. 

[32] PC. Hansen, The L-curve and its use in the numerical treatment of 
inverse problems, Computational Inverse Problems in Electrocardiology, 
WIT Press, 2000, pp. 119-142. 

[33] PR. Johnston and R.M. Gulrajani, "A new method for regularization 
parameter determination in the inverse problem of electrocardiography," 
IEEE Transaction on Biomedical Engineering, Vol. 44, No. 1, pp. 19-39, 
1997. 

[34] S. Oraintara, W.C. Karl, D.A. Castanon, and T.Q. Nguyen, "A method 
for choosing the regularization parameter in generalized tikhonov regu- 
larized linear inverse problems," Proc. International Conference of Image 
Processing, pp. 93-96, 2000. 

[35] S.M. Tan and C. Fox, "Regularization methods for linear inverse 
problems," Lecture Note: PHYSICS 707 Inverse Problems, Chapter 3, 
pp. 1-15, 1998. 

[36] K. Kunisch and J. Zhou, "Iterative choices of regularization parameters 
in linear inverse problems," Inverse Problems, Vol. 14, No. 5, pp. 1247- 
64, 1998. 

[37] M. Beige, E. Miller, and M. Kilmer, "Simultaneous multiple regulariza- 



tion parameter selection by means of the L-hypersurface with applications 
to linear inverse problems posed in the wavelet transform domain," SPIE 
hit' I Symposium on Optical Science, Engineering, and Instrumentation: 
Bayesian Inference for Inverse Problems, pp. 328-36, 1998. 

[38] T. Reginska, "A regularization parameter in discrete ill-posed problems," 
SIAM Journal of Scientific Computing, Vol. 17, No. 3, pp. 740-9, 1996. 

[39] D.L. Phillips, "A technique for the numerical solution of certain integral 
equations of the first kind," J. Association for Computing Machinery, 
Vol. 9, Issue 1, pp. 84-97, 1997. 

[40] A.N. Tikhonov, "Solution of incorrectly formulated problems and the 
regularization method," Soviet Math. Dokl. 4, pp. 1035-8, 1963. English 
translation of Dokl. Akad. Nauk. SSSR, 151, pp. 501-4, 1963. 

[41] H. Hindi, "A tutorial on convex optimization," Proc. American Control 
Conference, pp. 3252-65, 2004. 

[42] D.P. Bertsekas, "Nonlinear Programming 2nd Ed.," Athena Scientific, 
1999. 

[43] S. Oraintara, "Regular Linear Phase Perfect Reconstruction Filter Banks 
for Image Compression," PhD Thesis, Boston University, 2000, Appendix 
A. 

[44] L. Grippo and M. Sciandrone, "On the convergence of the block 
nonlinear Gauss-Seidel method under convex constraints," Operation 
Research Letters, Vol. 26, pp. 127-36, 2000. 



